Abstract. MV3 is a new word based stream cipher for encrypting long
streams of data. A direct adaptation of a byte based cipher such as RC4 
into a 32- or 64-bit word version will obviously need vast amounts of 
memory. This scaling issue necessitates a look for new components and 
principles, as well as mathematical analysis to justify their use. Our ap- 
proach, like RC4's, is based on rapidly mixing random walks on directed 
graphs (that is, walks which reach a random state quickly, from any 
starting point). We begin with some well understood walks, and then in- 
troduce nonlinearity in their steps in order to improve security and show 
long term statistical correlations are negligible. To minimize the short 
term correlations, as well as to deter attacks using equations involving 
successive outputs, we provide a method for sequencing the outputs de- 
rived from the walk using three revolving buffers. The cipher is fast — 
it runs at a speed of less than 5 cycles per byte on a Pentium IV pro- 
cessor. A word based cipher needs to output more bits per step, which 
exposes more correlations for attacks. Moreover we seek simplicity of 
construction and transparent analysis. To meet these requirements, we 
use a larger state and claim security corresponding to only a fraction of 
it. Our design is for an adequately secure word-based cipher; our very 
preliminary estimate puts the security close to exhaustive search for keys
of size < 256 bits.

1 Introduction

Stream ciphers are widely used and essential in practical cryptography. Most are
custom designed, e.g. alleged RC4 [Sch95, Ch. 16], SEAL [RC98], SCREAM [HCJ02],
and LFSR-based NESSIE submissions such as LILI-128, SNOW, and SOBER [P+03,
Ch. 3]. The VRA cipher [ARV95] has many provable properties, but requires
more memory than the rest. We propose some new components and principles
for stream cipher design, as well as their mathematical analysis, and present a 
concrete stream cipher called MV3. 

To motivate our construction, we begin by considering RC4 in detail. It is an 
exceptionally short, byte-based algorithm that uses only 256 bytes of memory. 
It is based on random walks (card shuffles), and has no serious attacks. Modern 
personal computers are evolving from 32 to 64 bit words, while a growing number 
of smaller devices have different constraints on their word and memory sizes. 
Thus one may desire ciphers better suited to their architectures, and seek designs 
that scale nicely across these sizes. Here we focus on scaling up such random walk 
based ciphers. Clearly, a direct adaptation of RC4 would require vast amounts 
of memory. 

The security properties of most stream ciphers are not based on some hard 
problem (e.g., as RSA is based on factoring). One would expect this to be the case 
in the foreseeable future. Nevertheless, they use components that - to varying 
degrees - are analyzable in some idealized sense. This analysis typically involves 
simple statistical parameters such as cycle length and mixing time. For example, 
one idealizes each iteration of the main loop of RC4 as a step in a random walk 
over its state space. This can be modeled by a graph G with nodes consisting
of $S_{256}$, the permutations on 256 objects, and edges connecting nodes that
differ by a transposition. Thus far no serious deviations from the random walk
assumptions are known. Since storing an element of $S_{2^{32}}$ or $S_{2^{64}}$ is out of the
question, one may try simulations using smaller permutations; however, this is 
nontrivial if we desire both competitive speeds and a clear analysis. It therefore 
is attractive to consider other options for the underlying graph G. 

One of the most important parameters of RC4 is its mixing time. This denotes 
the number of steps one needs to start from an arbitrary state and achieve 
uniform distribution over the state space through a sequence of independent 
random moves. This parameter is typically not easy to determine. Moreover, RC4 
keeps a loop counter that is incremented modulo 256, which introduces a memory 
over 256 steps. Thus its steps are not even Markovian (where a move from the 
current state is independent of earlier ones). Nevertheless, the independence of 
moves has been a helpful idealization (perhaps similar to Shannon's random 
permutation model for block ciphers), which we will also adhere to. 
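
To make the random-walk idealization concrete, here is a minimal Python sketch of the publicly described (alleged) RC4 algorithm. It is not part of MV3, and the function names `rc4_init` and `rc4_keystream` are our own. Each output step applies exactly one swap to the permutation, i.e. it moves along one edge of the graph on the permutations of 256 objects, while the counter `i` is the public loop counter that makes the process non-Markovian.

```python
def rc4_init(key: bytes) -> list:
    """Key scheduling: build the initial permutation of 0..255 from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n keystream bytes; each step applies one transposition to s."""
    s = rc4_init(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256          # public loop counter, incremented modulo 256
        j = (j + s[i]) % 256       # secret index, idealized as a random move
        s[i], s[j] = s[j], s[i]    # one transposition: one edge of the graph
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


stream = rc4_keystream(b"Key", 16)
print(stream.hex())
```

The state is only ever changed by swaps, so it remains a permutation throughout; this is the property the random-walk model relies on.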

We identify and focus on the following problems: 

• Problem 1: Graph Design. How to design graphs whose random walks
are suitable for stream ciphers that work on arbitrary word sizes.

• Problem 2: Extraction. How to extract bits to output from (the labels
of) the nodes visited by the walk.

• Problem 3: Sequencing. How to sequence the nodes visited by the walk
so as to diminish any attacks that use relationships (e.g. equations) between
successive outputs.

We now expand on these issues. At the outset, it is important to point out 
the desirability of simple register operations, such as additions, multiplications, 
shifts, and xor's. These are crucial for fast implementation, and preclude us 
from using many existing constructions of expander graphs (such as those in 
[LPS86, HLW06]). Thus part of the cipher design involves new mathematical
proofs and constructions. The presentation of the cipher does not require these
details, which may be found in Appendix A.

High level Design Principles: Clearly, a word based cipher has to output 
more bits per step of the algorithm. But this exposes more relationships on the 
output sequence, and to mitigate its effect we increase the state size and aim 
at security that is only a fraction of the log of the state size. We also tried to 
keep our analysis as transparent and construction as simple as possible. Our 
key initialization is a bit bulky and in some applications may require further 
simplifications, a topic for future research. 

1.1 Graph Design: Statistical properties and Non-linearities 

In the graph design, one wants to keep the mixing time $\tau$ small as a way to keep
the long term correlations negligible. This is because many important properties
are guaranteed for walks that are longer than $\tau$. For example, such a walk visits
any given set $S$ nearly the expected number of times, with exponentially small
deviations (see Theorem A.2). A corollary of this fact is that each output bit is
unbiased.

Thus one desires the optimal mixing time, which is on the order of log N, 
N being the size of the underlying state space. Graphs with this property have 
been well studied, but the requirements for stream ciphers are more complicated, 
and we are not aware of any work that focuses on this issue. For example, the 
graphs whose nodes are $\mathbb{Z}/2^n\mathbb{Z}$ (respectively $(\mathbb{Z}/2^n\mathbb{Z})^*$) and edges are $(x, x + g_i)$
(respectively $(x, x \cdot g_i)$), where the $g_i$ are randomly chosen and $i = O(n)$, have this
property [AR94]. While these graphs are clearly very efficient to implement,
their commutative operations are quite linear and hence the attacks mentioned 
in Problem 3 above can be effective. 
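
As an illustration of the additive walks just described, the following Python sketch computes the exact distribution of the walk $x \to x + g_i$ on $\mathbb{Z}/2^n\mathbb{Z}$ for a small word size and reports its total variation distance from uniform. The word size (8 bits), the number of generators (2n), the seed, and the function names are all assumptions made for the example; they are not parameters of MV3.

```python
import random


def walk_distribution(n_bits, gens, steps):
    """Exact distributions of the walk x -> x + g (g uniform in gens) on
    Z/2^n Z, started at 0, after 0, 1, ..., steps steps."""
    size = 1 << n_bits
    dist = [0.0] * size
    dist[0] = 1.0
    history = [dist]
    for _ in range(steps):
        nxt = [0.0] * size
        for x, p in enumerate(dist):
            if p:
                share = p / len(gens)
                for g in gens:
                    nxt[(x + g) % size] += share
        dist = nxt
        history.append(dist)
    return history


def tv_to_uniform(dist):
    """Total variation distance between dist and the uniform distribution."""
    u = 1.0 / len(dist)
    return 0.5 * sum(abs(p - u) for p in dist)


n_bits = 8
rng = random.Random(2007)   # fixed seed, for reproducibility only
gens = [rng.randrange(1 << n_bits) for _ in range(2 * n_bits)]  # O(n) generators
tv = [tv_to_uniform(d) for d in walk_distribution(n_bits, gens, 40)]
for t in (0, 5, 10, 20, 40):
    print(t, round(tv[t], 6))
```

Because the uniform distribution is stationary for this walk, the printed distances can never increase with the number of steps; how quickly they fall depends on the particular generators drawn.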

To this end, we introduce some nonlinearities into our graphs. For example, 
in the graph on $\mathbb{Z}/2^n\mathbb{Z}$ from the previous paragraph, we can also add edges of
the form $(x, hx)$ or $(x, x^r)$. This intuitively allows for more randomness, as well
as disrupting relations between successive outputs. However, one still needs to 
prove that the mixing time of such a modified graph is still small. Typically this 
type of analysis is hard to come by, and in fact was previously believed to be 
false. However, we are able to give rigorous proofs in some cases, and empirically 
found the numerical evidence to be stronger yet in the other cases. More details 
can be found in the Appendices.

Mixing up the random walks on multiplicative and additive abelian groups
offers a principled way to combine with nonlinearities for an effective defense. 
As a practical matter, it is necessary to ensure that our (asymptotic) analysis 
applies when parameters are small, which we have verified experimentally. 

We remark here that introduction of nonlinearities was the main motivation 
behind the construction of the T-functions of Klimov and Shamir [KS02]. They
showed that the walk generated by a T-function deterministically visits every 
n-bit number once before repeating. A random walk does not go through all the 
nodes in the graph, but the probability that it returns to a previous node in m 
steps tends to the uniform probability at a rate that drops exponentially in m. 
It also allows us to analyze the statistical properties as indicated above. (See 
Appendix A for more background.)
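
The single-cycle behaviour of T-functions can be checked directly for small word sizes. The sketch below uses the mapping $x \to x + (x^2 \vee 5) \bmod 2^n$, which is the standard example usually attributed to Klimov and Shamir; it is quoted here from general knowledge rather than from this paper, and the function names are our own.

```python
def t_function(x: int, n_bits: int) -> int:
    """Klimov-Shamir style update x -> x + (x^2 OR 5) modulo 2^n."""
    mask = (1 << n_bits) - 1
    return (x + ((x * x) | 5)) & mask


def cycle_length(start: int, n_bits: int) -> int:
    """Number of steps before the orbit of `start` first returns to `start`."""
    x = t_function(start, n_bits)
    steps = 1
    while x != start:
        x = t_function(x, n_bits)
        steps += 1
    return steps


print(cycle_length(0, 16))
```

For 16-bit words the orbit of 0 has length $2^{16}$, so every 16-bit value is visited exactly once before the sequence repeats, in contrast to a random walk, which may revisit nodes.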

1.2 Extraction 

Obviously, if the nodes are visited truly randomly, one can simply output the 
lsb's of the node, and extraction is trivial. But when there are correlations 
among them, one can base an attack on studying equations involving successive 
outputs. One solution to this problem is to simultaneously hash a number of 
successive nodes using a suitable hashing function, but this will be expensive 
since the hash function has to work on very long inputs. 

Our solution to the sequencing problem below allows us to instead hash a 
linear combination of the nodes in a faster way. A new aspect of our construc- 
tion is that our hash function itself evolves on a random walk principle. We 
apply suitable rotations on the node labels (to alter the internal states) at the 
extraction step to ensure the top and bottom half of the words mix well. 

1.3 Sequencing 

As we just mentioned, the sequencing problem becomes significant if we wish 
to hash more bits to the output (in comparison to RC4). First we ensure that 
our graph is directed and has no short cycles. But this by itself is insufficient, 
since nodes visited at steps in an interval $[t, t + \Delta]$, where $\Delta \ll \tau$ for mixing time $\tau$, can have
strong correlations. Also, we wish to maximize the number of terms required 
in equations involved in the attacks mentioned in Problem 3. To this end, we 
store a short sequence of nodes visited by the walk in buffers, and sequence them 
properly. The buffers ensure that any relation among output bits is translated 
to a relation involving many nonconsecutive bits of the internal state. Hence, 
such relations cannot be used to mount efficient attacks on the internal state of 
the cipher. 

The study of such designs appears to be of independent interest. We are able to
justify their reduction of correlations via a theorem of [CHJ02] (see Section 4).

1.4 Analysis and Performance 

We do not have a full analysis of the exact cipher that is implemented. However,
we have ensured that our idealizations are in line with the ones that allow
RC4 to be viewed via random walks. Of course some degree of idealization is
necessary because random bits are required to implement any random walk; here
our design resembles that of alleged RC4 [Sch95, Ch. 16]. Likewise, our cipher
involves combining steps from different, independent random walks on the same 
underlying graph. We are able to separately analyze these processes, but al- 
though combining such steps should intuitively only enhance randomness, our 
exact mathematical models hold only for these separate components and hence 
we performed numerical tests as well. 

Our cipher MV3 is fast on 32 bit processors — it runs at a speed of 4.8 cycles 
a byte on Pentium IV, while the speed of RC4 is about 10 cycles a byte. Only 
two of the eSTREAM candidates [DC06] are faster on similar architecture.

We evaluated it against some known attacks and we present the details in
Section 4. We note that some of the guess-and-determine attacks against RC4
(e.g. [K+98]) are also applicable against MV3. However, the large size of the
internal state of MV3 makes these attacks much slower than exhaustive key
search, even for very long keys.

The security claim of MV3 is that no attack faster than exhaustive key search
can be mounted for keys of length up to 256 bits.¹

The paper is organized as follows: In Section 2 we give a description of MV3.
Section 3 contains the design rationale of the cipher. In Section 4 we examine
the security of MV3 with respect to various methods of cryptanalysis. Finally,
Section 5 summarizes the paper. We have also included appendices giving some
mathematical and historical background. 

2 The Cipher MV3 

In this section we describe the cipher algorithm and its basic ingredients. The
letters in its name stand for "multi-vector", and the number refers to the three
revolving buffers that the cipher is based upon.

Internal state. The main components of the internal state of MV3 are three
revolving buffers A, B, and C of length 32 double words (unsigned 32-bit
integers) each and a table T that consists of 256 double words. Additionally, there
are publicly known indices $i$ and $u$ ($i \in [0 \ldots 31]$, $u \in [0 \ldots 255]$), and secret
indices $j$, $c$, and $x$ ($c$ and $x$ are double words, $j$ is an unsigned byte).

Every 32 steps the buffers shift to the left: $A \leftarrow B$, $B \leftarrow C$, and C is emptied.
In code, only the pointers get reassigned (hence the name "revolving", since the
buffers are circularly rotated).

¹ Note that MV3 supports various key sizes of up to 8192 bits. However, the security
claims are only for keys of size up to 256 bits.

Updates. The internal state of the cipher gets constantly updated by means
of pseudo-random walks. Table T gets refreshed one entry every 32 steps, via
application of the following two operations:

$$u \leftarrow u + 1$$
$$T[u] \leftarrow T[u] + (T[j] \ggg 13).$$

(The symbol $x \ggg a$ denotes a circular rotation to the right of the double word $x$ by
$a$ bits.)

In other words, the $u$-th element of the table, where $u$ sweeps through the
table in a round-robin fashion, gets updated using $T[j]$.

In turn, index $j$ takes a step at every iteration (a process that can be idealized
as a random walk) as follows:

$$j \leftarrow j + (B[i] \bmod 256),$$

where $i$ is the index of the loop. Index $j$ is also used to update $x$:

$$x \leftarrow x + T[j],$$

which is used to fill buffer C by $C[i] \leftarrow (x \ggg 8)$.

Also, every 32 steps the multiplier $c$ is additively and multiplicatively
refreshed as follows:

$$c \leftarrow c + (A[0] \ggg 16)$$
$$c \leftarrow c \vee 1$$
$$c \leftarrow c^2 \quad \text{(can be replaced by } c \leftarrow c^3\text{)}$$

Main loop. The last ingredient of the cipher (except for the key setup) is
the instruction for producing the output. This instruction takes the following
form:

$$\text{output: } (x \cdot c) \oplus A[9i + 5] \oplus (B[7i + 18] \ggg 16).$$

The product $x \cdot c$ of two 32-bit numbers is taken modulo $2^{32}$.

Putting it all together, the main loop of the cipher is the following: 

Input: length len
Output: stream of length len
repeat len/32 times
    for i = 0 to 31
        j ← j + (B[i] mod 256)
        x ← x + T[j]
        C[i] ← (x ⋙ 8)
        output (x·c) ⊕ A[9i + 5] ⊕ (B[7i + 18] ⋙ 16)
    end for
    u ← u + 1
    T[u] ← T[u] + (T[j] ⋙ 13)
    c ← c + (A[0] ⋙ 16)
    c ← c ∨ 1
    c ← c² (can be replaced by c ← c³)
    A ← B, B ← C
end repeat
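
The main loop above can be sketched in Python as follows. This is an illustrative transcription, not a reference implementation: the class and function names are ours, and we assume that the buffer indices 9i + 5 and 7i + 18 are reduced modulo 32, that u wraps modulo 256, and that the "emptied" buffer C may simply reuse the storage of the retired buffer A. The demonstration state at the bottom is arbitrary and is not produced by the key initialization.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x: int, r: int) -> int:
    """Circular right rotation of a 32-bit word by r bits."""
    r %= 32
    return ((x >> r) | (x << (32 - r))) & MASK32


class MV3State:
    """Container for the internal state; field names follow the text."""

    def __init__(self, A, B, C, T, j=0, c=1, x=0, u=0):
        self.A, self.B, self.C, self.T = list(A), list(B), list(C), list(T)
        self.j, self.c, self.x, self.u = j, c, x, u


def mv3_words(st: MV3State, n_words: int) -> list:
    """Produce n_words (a multiple of 32) output words, updating st in place."""
    assert n_words % 32 == 0
    out = []
    for _ in range(n_words // 32):
        for i in range(32):
            st.j = (st.j + (st.B[i] & 0xFF)) & 0xFF      # j is an unsigned byte
            st.x = (st.x + st.T[st.j]) & MASK32
            st.C[i] = rotr32(st.x, 8)
            out.append(((st.x * st.c) & MASK32)
                       ^ st.A[(9 * i + 5) % 32]           # assumed: index mod 32
                       ^ rotr32(st.B[(7 * i + 18) % 32], 16))
        st.u = (st.u + 1) & 0xFF                          # assumed: u wraps mod 256
        st.T[st.u] = (st.T[st.u] + rotr32(st.T[st.j], 13)) & MASK32
        st.c = (st.c + rotr32(st.A[0], 16)) & MASK32
        st.c |= 1
        st.c = (st.c * st.c) & MASK32
        # Revolve the buffers by reassigning references; the old A becomes the
        # storage for the new C, which is fully overwritten in the next 32 steps.
        st.A, st.B, st.C = st.B, st.C, st.A
    return out


# Toy state for illustration only (NOT produced by the key initialization).
demo = MV3State(A=range(32), B=range(100, 132), C=[0] * 32, T=range(1000, 1256))
words = mv3_words(demo, 64)
print(len(words), [hex(w) for w in words[:4]])
```

Reassigning the three references at the end of each outer iteration mirrors the pointer rotation described in the text, so no buffer contents are copied.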
Key initialization.

The key initialization algorithm accepts as inputs a key K of length keylength, 
which can be any multiple of 32 less than or equal to 8192 (we recommend at 
least 96 bits), and an initial vector IV of the same length as the key. The key 
remains the same throughout the entire encryption session, though the initial 
vector changes occasionally. The initial vector is publicly known, but should 
not be easily predictable. For example, it is possible to start with a "random" 
IV using a (possibly insecure) pseudo-random number generator known to the 
attacker, and then increment the IV by 1 every time (see Section 4.2).

The key initialization algorithm is the following: 

Input: key key and initial vector IV, both of length keylength double words
Output: internal state that depends on the key and the IV

j, x, u ← 0
c ← 1
fill A, B, C, T with 0xEF
for i = 0 to 3
    for l = 0 to 255
        T[i + l] ← T[i + l] + (key[l mod keylength] ⋙ 8i) + l
    end for
    produce 1024 bytes of MV3 output
    encrypt T with the resulting key stream
end for
for i = 4 to 7
    for l = 0 to 255
        T[i + l] ← T[i + l] + (IV[l mod keylength] ⋙ 8i) + l
    end for
    produce 1024 bytes of MV3 output
    encrypt T with the resulting key stream
end for

Note that when only the IV is changed, only the second half of the key 
initialization is performed. 

3 Design Rationale 

In this section we describe more of the motivating principles behind the new 
cipher. 

Internal state. The internal state of the cipher has a huge size of more
than 11,000 bits. This makes guess-and-determine attacks on it (like the attack
against RC4 in [K+98]) much slower than exhaustive key search, even for very
long keys. In addition, it also secures the cipher from time/memory tradeoff
attacks trying to invert the function $f : \text{State} \to \text{Output}$, even for large key
sizes. More detail on the security of the cipher with respect to these attacks
appears in Section 4.

The buffers A, B, C and table T, as well as the indices j, c, and x should
never be exposed. Since the key stream is available to the attacker and depends 
on this secret information, the cipher strictly adheres to the following design 
principles: 

Principle 1. Output words must depend on as many secret words as possible. 
Principle 2. Retire information faster than the adversary can exploit it. 

As the main vehicle towards these goals, we use random walks (or, more 
precisely, pseudo-random walks, as the cipher is fully deterministic). 

Updates. The updates of the internal state are based on several simultane- 
ously applied random walks. On the one hand, these updates are very simple and 
can be efficiently implemented. On the other hand, as shown in the appendices, the
update mechanism allows one to mathematically prove some randomness
properties of the sequence of internal states. Note that the random walks are
interleaved, and the randomness of each one of them relies on the randomness of the
others. Note also that the updates use addition in $\mathbb{Z}/2^n\mathbb{Z}$ and not a bitwise XOR
operation. This partially resolves the problem of high-probability short
correlations in random walks: In an undirected random walk, there is a high probability
that after a short number of steps the state returns to a previous state, while
in a directed random walk this phenomenon does not exist. For example, if we
were to use the update rule $x \leftarrow x \oplus T[j]$, then with probability $2^{-8}$ (rather than
the trivial $2^{-32}$) $x$ would return to the same value after two steps. The use of
addition, which unlike XOR is not an involution, prevents this property. However,
in the security proof for the idealized model we use the undirected case, since the
known proofs of rapid mixing (like the theorem of Alon and Roichman [AR94])
refer to that case.
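
The difference between the two update rules can be seen in a few lines of Python. The sketch below is only an illustration: the function names and the sample values are ours, and the exhaustive count uses a reduced 16-bit word size (an assumption made so that the loop is small). It models the event in which the same table word is used in two consecutive steps, which in the idealized model happens with probability about $2^{-8}$.

```python
MASK32 = 0xFFFFFFFF


def two_steps_xor(x: int, t1: int, t2: int) -> int:
    """Two steps of the hypothetical rule x <- x XOR T[j]."""
    return x ^ t1 ^ t2


def two_steps_add(x: int, t1: int, t2: int) -> int:
    """Two steps of the rule actually used, x <- x + T[j] (mod 2^32)."""
    return (x + t1 + t2) & MASK32


x, t = 0x12345678, 0x9ABCDEF1
# Same table word t used in both steps (the same j twice in a row).
print(two_steps_xor(x, t, t) == x)   # XOR undoes itself
print(two_steps_add(x, t, t) == x)   # addition returns only if 2t = 0 mod 2^32

# Reduced 16-bit words (an assumption, so the search is exhaustive): table
# words for which two equal additive steps bring x back to its old value.
returning = [t16 for t16 in range(1 << 16) if (2 * t16) % (1 << 16) == 0]
print(returning)
```

With XOR every repeated table word cancels, whereas with addition only two special word values do, which is why the additive walk avoids the high-probability two-step return.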

Introducing nonlinearity. In order to introduce some nonlinearity we use 
a multiplier c that affects the cipher output in a multiplicative way. The value of 
c is updated using an expander graph which involves both addition and
multiplication, as explained in the appendices. It is far from clear that the squaring or
cubing operation still leaves the mixing time small, and our theorem addresses this.

Our update of $c$ involves a step $c \leftarrow c \vee 1$. This operation may at first seem
odd, since it leaks LSB($c$) to the attacker, who may use it for a distinguishing attack
based only on the LSB of outputs, ignoring $c$ entirely. However, this operation is
essential, since otherwise the attacker can exploit cases where $c = 0$, which occur
with a relatively high probability due to the $c \leftarrow c^2$ operation (and last
for 32 steps at a time). In this situation, they can disregard the term $x \cdot c$ and
devise a guess-and-determine attack with a much lower time complexity than
the currently possible one.
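
The role of the step $c \leftarrow c \vee 1$ can be checked numerically. The Python sketch below counts, for a reduced 16-bit word size (an assumption made so the search is exhaustive), how many values of $c$ square to zero, and then shows that the refreshed multiplier stays odd. The function names are ours. For 32-bit words the same argument applies to the multiples of $2^{16}$, i.e. a fraction $2^{-16}$ of all words if $c$ is idealized as uniformly distributed.

```python
def squares_to_zero(word_bits: int) -> int:
    """Count words c with c*c = 0 modulo 2^word_bits (exhaustive search)."""
    modulus = 1 << word_bits
    return sum(1 for c in range(modulus) if (c * c) % modulus == 0)


def refresh(c: int, a0_rot: int, word_bits: int = 32) -> int:
    """One refresh of the multiplier: add, force the low bit to 1, then square."""
    mask = (1 << word_bits) - 1
    c = (c + a0_rot) & mask
    c |= 1
    return (c * c) & mask


zero_count = squares_to_zero(16)   # reduced 16-bit words: exhaustive check
print(zero_count, zero_count / 2 ** 16)

c = 0
for a in (0, 0x00010000, 0xFFFFFFFF, 0x80000000):
    c = refresh(c, a)
    print(hex(c), c & 1)
```

Since the square of an odd number is odd, forcing the low bit before squaring guarantees that $c$ never becomes zero, at the price of a known least significant bit.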

Sequencing rule. The goals of this step were explained in Section 1.3. Our
output rule is based on the following general structure: The underlying walk
$x_0, x_1, \ldots, x_n, \ldots$ is transformed into the output $y_0, y_1, \ldots, y_n, \ldots$ via a linear
transformation:

$$y_i = x_{n_{i1}} \oplus x_{n_{i2}} \oplus \cdots \oplus x_{n_{ik}}.$$

Without loss of generality, we assume that the indices are sorted: $n_{i1} < n_{i2} <
\cdots < n_{ik}$. Let $\mathcal{N} = \{n_{ij}\}$. The set $\mathcal{N}$ is chosen to optimize the following
parameters:

1. Minimize the latency and the buffer size required to compute $y_i$. To this end,
we require that there be two constants $m$ and $C$, between 64 and 256,
such that $i - C \le n_{ij} \le i$ for each $i > m$ and $1 \le j \le k$. We additionally
constrain $n_{ik} = i$ for all $i > m$;

2. Maximize the minimal size of a set of pairs $x_j, x_{j+1}$ that can be expressed
as a linear combination of $y$'s. More precisely, we seek to maximize the smallest
$a$ for which the following holds for some $j_1, \ldots, j_b > m$ and $i_1, \ldots, i_a$:

$$(x_{i_1} \oplus x_{i_1+1}) \oplus (x_{i_2} \oplus x_{i_2+1}) \oplus \cdots \oplus (x_{i_a} \oplus x_{i_a+1}) = y_{j_1} \oplus y_{j_2} \oplus \cdots \oplus y_{j_b}. \tag{3.1}$$

Notice that the value of $b$ has not been constrained, since usually this value
is not too high and the attacker can obtain the required data.

Intuitively speaking, the second constraint ensures that if the smallest feasible 
a is large enough, no linear properties of the x walk propagate to the y walk. 
Indeed, any linear function on the y walk can be expressed as a function on the 
x walk. Since the x walk is memoryless, any linear function on a subset of x's 
can be written as a XOR of linear functions on the intervals of the walk. Each 
such interval can in turn be broken down as a sum of pairs. If a is large enough, 
no linear function can be a good distinguisher. Note that we concentrate on the 
relation between consecutive values of the state x, since in a directed random 
walk such pairs of states seem to be the most correlated ones. 

Constructing the index set $\mathcal{N}$ can be greatly simplified if $\mathcal{N}$ has periodic structure.
Experiments demonstrate that for sequences with period 32 and $k = 3$, $a$ can be
as large as 12. Moreover, the best sequences have a highly regular structure, such
as $n_{i1} = i - (5k \bmod 16)$ and $n_{i2} = i - 16 - (3k \bmod 16)$, where $k = i \bmod 16$.
For larger periods $a$ cannot be computed directly; an analytical approach is
desirable.
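
The regular example sequence above is easy to tabulate. The following Python sketch computes the three indices for a given $i$ and checks how far back in the walk each output reaches. The function names are ours, and we assume that the third index is $i$ itself, so that an output can be produced as soon as the newest walk element is available.

```python
def tap_indices(i: int) -> tuple:
    """Indices of the three walk elements XORed into y_i for the example."""
    k = i % 16
    n1 = i - (5 * k % 16)
    n2 = i - 16 - (3 * k % 16)
    n3 = i                      # assumed: the newest element x_i is always used
    return n1, n2, n3


def output_word(xs: list, i: int) -> int:
    """y_i as the XOR of the selected walk elements (xs holds x_0, x_1, ...)."""
    n1, n2, n3 = tap_indices(i)
    return xs[n1] ^ xs[n2] ^ xs[n3]


offsets1 = sorted(i - tap_indices(i)[0] for i in range(32, 48))
offsets2 = sorted(i - tap_indices(i)[1] for i in range(32, 48))
max_lag = max(i - min(tap_indices(i)) for i in range(32, 64))
print(offsets1, offsets2, max_lag)
```

Because 5 and 3 are both coprime to 16, each of the two offsets runs through every value in its 16-element window once per period, and no output looks back more than 31 steps, which is consistent with a buffer-based implementation.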

As soon as the set of indices is fixed, $y_i$ for $i > m$ can be output once $x_i$
becomes available. The size of the buffer should be at least $i - n_{ij}$ for any $i > m$
and $j$. If $\mathcal{N}$ is periodic, retiring older elements can be trivially implemented
by keeping several buffers and rotating between them. We note that somewhat
similar buffers were used recently in the design of the stream cipher Py [BS05].

More precisely, if we choose the period P = 32 and k = 3, i.e. every output 
element is an XOR of three elements of the walk, the output rule can be imple- 
mented by keeping three P-word buffers, A, B, and C. Their content is shifted 
to the left every P cycles: A is discarded, B moves to A, and C moves to B. The 
last operation can be efficiently implemented by rotating pointers to the three 
buffers. 

The exact constants in the output rule are chosen to maximize
the girth and other useful properties of the graph of dependencies between
internal variables and the output, which is available to the attacker.

Rotations. Another operation used both in the output rule and in the update of the
internal state is bit rotation. The motivation behind this is as follows: all the 
operations used in MV3 except for the rotation (that is, bitwise XOR, modular 
addition and multiplication) have the property that in order to know the k least 
significant bits of the output of the operation, it is sufficient to know the k least 
significant bits of the input. An attacker can use this property to devise an attack 
based on examining only the k least significant bits of the output words, and 
disregard all the other bits. This would dramatically reduce the time complexity 
of guess-and-determine attacks. For example, if no rotations were used in the
cipher, then a variant of the standard guess-and-determine attack presented in
Section 4 would apply. This variant examines only the least significant byte of
every word, and reduces the time complexity of the attack to the fourth root of 
the original time complexity. 
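
The low-bit property described above can be demonstrated directly. In the Python sketch below (sample values and helper names are ours), each operation is applied once to two full 32-bit words and once to words that agree with them only on their 8 least significant bits; XOR, modular addition, and modular multiplication give the same low byte in both cases, while a rotation by 16 does not.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x: int, r: int) -> int:
    """Circular right rotation of a 32-bit word by r bits."""
    r %= 32
    return ((x >> r) | (x << (32 - r))) & MASK32


def low(x: int, k: int) -> int:
    """The k least significant bits of x."""
    return x & ((1 << k) - 1)


k = 8
a, b = 0xDEADBEEF, 0x0BADF00D
# Words that agree with a and b only on their k low bits.
a2, b2 = low(a, k), low(b, k)

ops = {
    "xor": lambda u, v: u ^ v,
    "add": lambda u, v: (u + v) & MASK32,
    "mul": lambda u, v: (u * v) & MASK32,
    "rotr16": lambda u, v: rotr32(u, 16) ^ v,
}
for name, op in ops.items():
    same = low(op(a, b), k) == low(op(a2, b2), k)
    print(name, same)
```

An attacker who tracks only the low byte of every word therefore loses information as soon as a rotation is applied, which is the effect the design relies on.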

One possible way to overcome this problem is to use additional operations 
that do not have this problematic property, like multiplication in some other 
modular group. However, such operations slow the cipher significantly. The
rotations used in MV3 can be efficiently implemented and prevent the attacker
from tracing only the several least significant bits of the words. We note that
similar techniques were used in the stream cipher Sosemanuk [B+05] and in
other ciphers as well.

Key setup. Since the bulk of the internal state is the table T, we concentrate 
on intermingling T and the pair (key, IV). Once T is fully dependent on the key 
and the IV, the revolving buffers and other internal variables will necessarily 
follow suit. 

We have specified that the IV be as long as the key in order to prevent
time/memory tradeoff attacks that try to invert the function $g : (\text{key}, \text{IV}) \to \text{Output}$.
The IV is known to the attacker but should not be easily predictable.
One should avoid initializing the IV to zero at the beginning of every
encryption session (as is frequently done in other applications), since this reduces the
effective size of the IV and allows for better time/memory tradeoff attacks. A
more comprehensive study of the security of MV3 with respect to time/memory
tradeoff attacks is presented in Section 4.

We note that the key initialization phase is relatively slow. However, since
the cipher is intended for encrypting long streams of data, the fast speed of
the output stream generation compensates for it. We note that since the IV
initialization phase is also quite slow, the IV should not be re-initialized too
frequently.

4 Security 

MV3 is designed to be a fast and very secure cipher. We are not aware of any
attacks on MV3 faster than exhaustive key search even for huge key sizes of more
than 1000 bits (except for the related key attacks discussed later in this section), but have only
made security claims up to a 256-bit key size. In this section we analyze the
security of MV3 against various kinds of cryptanalytic attacks.

4.1 Tests

We ran the cipher through several tests. First, we used two well-known batteries 
of general tests. One is Marsaglia's time-tested DIEHARD collection [Mar97],
and the other is the NIST set of tests used to assess AES candidates [R+01]
(with corrections as per [KUH04]). Both test suites were easily cleared by MV3.

In light of attacks on the first few output bytes of RC4 [MS01, Mir02], the
most popular stream cipher, we tested the distribution of the initial double words 
of MV3 (by choosing a random 160-bit key and generating the first double word 
of the output). No anomalies were found. 

RC4's key stream is also known to have correlations between the least
significant bits of bytes one step away from each other [Gol97]. Neither of the two
collections of tests specifically targets bits in similar positions of the output's 
double words. To compensate for that, we ran both DIEHARD and NIST's tests 
on the most and the least significant bits of 32-bit words of the key stream. 
Again, none of the tests raised a flag. 

4.2 Time/Memory/Data Tradeoff Attacks 

There are two main types of TMDTO (time/memory/data tradeoff) attacks on
stream ciphers. 

The first type consists of attacks that try to invert the function $f : \text{State} \to \text{Output}$
(see, for example, [BS00]). In order to prevent attacks of this type, the
size of the internal state should be at least twice the key length.
In MV3, the size of the internal state is more than 11,000 bits, and hence there
are no TMDTO attacks of this type faster than exhaustive key search for keys
shorter than 5,500 bits. Our table sizes are larger than what one may
expect to be necessary to make adequate security claims, but we have chosen
our designs so that we can keep our analysis of the components transparent, and
computational overhead per word of output minimal. We intend to return to
this in a future paper and propose an algorithm for lightweight applications
where memory is at a premium, based on different principles.
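
The state-size figures quoted above follow from the description of the internal state in Section 2. The short Python calculation below adds up the secret components only; leaving out the public indices i and u is our assumption about how the count is made.

```python
WORD_BITS = 32

state_bits = (
    3 * 32 * WORD_BITS      # revolving buffers A, B, C: 32 words each
    + 256 * WORD_BITS       # table T: 256 words
    + 8                     # secret index j (one byte)
    + 2 * WORD_BITS         # secret words c and x
)
# Rule of thumb from the text: the state should be at least twice the key length.
max_key_bits = state_bits // 2

print(state_bits, max_key_bits)
```

The total of 11,336 bits is consistent with the "more than 11,000 bits" stated in the text, and half of it lies above the 5,500-bit threshold.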

The second type consists of attacks that try to invert the function $g : (\text{Key}, \text{IV}) \to \text{Output}$
(see, for example, [HS05]). The IV should be at least
as long as the key - as we have mandated in our key initialization - in order 
to prevent such attacks faster than exhaustive key search. We note again that 
if the IV's are used in some predictable way (for example, initialized to zero 
at the beginning of the encryption session and then incremented sequentially), 
then the effective size of the IV is much smaller, and this may enable a faster 
TMDTO attack. However, in order to overcome this problem the IV does not 
have to be "very random". The only thing needed is that the attacker will not
be able to know which IV will be used in every encryption session. This can 
be achieved by initializing the IV in the beginning of the session using some 
(possibly insecure) publicly known pseudo-random number generator and then 
incrementing it sequentially.

4.3 Guess-and-Determine Attacks

A guess-and-determine attack against RC4 appears in [K+98]. The attack, adapted
to MV3, has the following form:

1. The attacker guesses the values of all 32 words in each of the buffers A and B in
some loop of MV3, and the values of j, c, x at the beginning of the loop.

2. Using the guessed values of the words in B, the attacker traces the value of 
j during the whole loop. 

3. Using the output stream and the guessed values, the attacker traces the value 
of x during the whole loop. 

4. Using the update rule of x and the knowledge of j, the attacker gets the 
values of 32 words in the T array. If the attacker encounters a word whose 
value is already known to her, she checks whether the values match, and if 
not, discards the initial guess. 

5. The attacker moves on to the next loop. Note that due to the knowledge of 
buffer A and some of the words T[j], the attacker can trace the update of c 
and of the T register. 

6. Each "collision" in the T array supplies the attacker with a 32-bit filtering
condition. Since the attacker started by guessing 66 32-bit words, finding 70
collisions should be sufficient to discard all the wrong guesses and find the
right one. In 10 loops we expect to find more than 70 such collisions, and
hence $2^{14}$ bits of key stream will be sufficient for the attacker to find the
internal state of the cipher.

7. Once the attacker knows the internal state, she can compute the entire out- 
put stream without knowing the key. 

However, the time complexity of this attack is quite large, more than $2^{2000}$,
since the attacker starts by guessing more than 2000 bits of the state. Hence,
this attack is slower than exhaustive key search for keys shorter than 2000
bits.
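
The numbers in the attack outline can be sanity-checked with a small calculation. The Python sketch below assumes, as an idealization, that each step reads a uniformly random entry of T independently of the others, and estimates how many of the lookups in 10 loops hit an entry that was already recovered. The helper name is ours.

```python
def expected_collisions(draws: int, slots: int = 256) -> float:
    """Expected number of draws that hit an already-visited slot, assuming
    each draw picks one of `slots` table entries uniformly and independently."""
    expected_distinct = slots * (1.0 - (1.0 - 1.0 / slots) ** draws)
    return draws - expected_distinct


guessed_bits = 66 * 32        # 66 guessed 32-bit words, as counted in step 6
loops = 10
draws = loops * 32            # one T[j] lookup per step, 32 steps per loop
keystream_bits = draws * 32   # 32 output bits per step

print(guessed_bits, keystream_bits, round(expected_collisions(draws), 1))
```

Under this idealization the expected number of collisions in 10 loops is well above 70, the 70 resulting 32-bit conditions exceed the number of guessed bits, and the required key stream stays below $2^{14}$ bits, in line with the figures in the text.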

4.4 Guess-and-Determine Attacks Using the Several Least 
Significant Bits of the Words 

Most of the operations in MV3 allow the attacker to focus the attack on the
k least significant bits, thus dramatically reducing the number of bits guessed 
in the beginning of the attack. We consider two reasonable attacks along these 
lines. 

The first attack concentrates on the least significant bit of the output words. 
In this case, since the least significant bit of c is fixed to 1, the attacker can 
disregard c altogether. However, the attacker then cannot trace the values of j, 
and guessing them at every step would make the time complexity too high. Hence, 
it seems that this attack is not applicable to mv3. 

The second attack concentrates on the eight least significant bits of every 
output word. If there were no rotations in the update and output rules, the 
attacker would indeed be able to use her guess to trace the values of j and the 
eight least significant bits in all the words of the internal state. This would result 
in an attack with time complexity of about $2^{600}$. However, the rotations cause 
several difficulties for such an attack: 

1. Due to the rotations, the values the attacker knows after her initial guess 
are bits 24-31 of the words in buffer C, bits 16-23 of the words in buffer 
B, and bits 0-7 of the words in buffer A (these are the bits that affect the 
eight lsb's of the output words). Yet the attacker still does not know the 
eight lsb's of the words in buffer B and hence cannot find the value of j. 

2. If the attacker rolls the arrays to the previous loop, she can find the eight 
lsb's of the words in buffer B. However, the attacker cannot use her guess 
to get information from the previous loop. In that loop she knows bits 0-7 
of the words in buffer B and bits 16-23 of the words in buffer C, but in 
order to compare the information with the eight lsb's of the output stream, 
she needs bits 16-23 of the words in buffer B and bits 24-31 of the words 
in buffer C. Therefore, the guesses in consecutive loops cannot be combined 
together. 

Hence, it seems that neither of the attacks can be applied, unless the attacker 
guesses the full values of all the words in two buffers, which leads to the attack 
described in Subsection 4.3 (with a time complexity of more than $2^{2000}$). 

4.5 Linear Distinguishing Attacks 

Linear distinguishing attacks aim at distinguishing the cipher output from ran- 
dom streams, using linear approximations of the non-linear function used in the 
cipher - in our case, the random walk. 

In [CHJ02], Coppersmith et al. developed a general framework to evaluate 
the security of several types of stream ciphers with respect to these attacks. It 
appears that the structure of mv3 falls into this framework, to which [CHJ02, 
Theorem 6] directly applies: 

Theorem 1. Let $\epsilon$ be the bias of the best linear approximation one can find for 
pairs $x_i, x_{i+1}$, and let $A_N(a)$ be the number of equations of type (3.1) that hold 
for the sequence $y_m, y_{m+1}, \ldots$. Then the statistical distance between the cipher 
and the random string is bounded from above by 

(4.1) 

Note that for $\epsilon \ll 1/2$, the bound (4.1) is dominated by the term with the 
smallest $a$, which equals 12 in our case. Since the relation between $x_i$ and $x_{i+1}$ 
is based on a random walk, $\epsilon$ is expected to be very small. Since the statistical 
distance is of order $\epsilon^{24}$, we expect that the cipher cannot be distinguished from a 
random string using a linear attack, even if the attacker uses a very long output 
stream for the analysis. 

4.6 Related-Key Attacks and Key Schedule Considerations 

Related key attacks study the relation between the key streams derived from 
two unknown, but related, secret keys. These attacks can be classified into dis- 
tinguishing attacks, that merely try to distinguish between the key stream and 
a random stream, and key recovery attacks, that try to find the actual values of 
the secret keys. 

One of the main difficulties in designing the key schedule of a stream cipher 
with a very large state is the vulnerability to related-key distinguishing attacks. 
Indeed, if the key schedule is not very complicated and time consuming, an 
attacker may be able to find a relation between two keys that propagates to a 
very small difference in the generated states. Such small differences can be easily 
detected by observing the first few words of the output stream. 

It appears that this difficulty applies to the current key schedule of mv3. 
For long keys, an attacker can mount a simple related-key distinguishing attack 
on the cipher. Assume that keylength = 8192/t. Then in any step of the key 
initialization phase, every word of the key affects exactly t words in the T array, 
after which the main loop of the cipher is run eight times and the output stream 
is XORed (bit-wise) into the content of the T array. The same is repeated with the 
IV replacing the key in the IV initialization phase. 

The attacker considers encryption under the same key with two IVs that 
differ only in one word. Since the key is the same in the two encryptions, the 
entire key initialization phase is also the same. After the first step of the IV 
initialization, the intermediate values differ in exactly t words in the T array. 
Then, the main loop is run eight times. Using the random walk assumption, we 
estimate that, with probability $(1 - t/256)^{256}$, all of the corresponding words 
in the respective T arrays used in these eight loops are equal, making the output 
stream equal in both encryptions. Hence, with probability $(1 - t/256)^{256}$, after 
the first step of the IV initialization the arrays A, B, and C are equal in both 
encryptions and the respective T arrays differ only in t words. 

The same situation occurs in the following three steps of the IV initialization. 
Therefore, with probability 

$(1 - t/256)^{256} \cdot (1 - 2t/256)^{256} \cdot (1 - 3t/256)^{256} \cdot (1 - 4t/256)^{256}$ (4.2) 

all of the corresponding words used during the entire initialization phase are 
equal in the two encryptions. Then with probability $(1 - 4t/256)^{32}$ all of the 
corresponding words used in the first loop of the key stream generation are also 
equal in the two encryptions, resulting in two equal key streams. Surely this can 
be easily recognized by the attacker after observing the key stream generated in 
the first loop. 

In order to distinguish between mv3 and a random cipher, the attacker has 
to observe about 

$M = (1 - t/256)^{-256} \cdot (1 - 2t/256)^{-256} \cdot (1 - 3t/256)^{-256} \cdot (1 - 4t/256)^{-256} \cdot (1 - 4t/256)^{-32}$ (4.3) 

pairs of related IVs, and for each pair she has to check whether there is equality 
in the first 32 key stream words. Hence, the data and time complexities of the 
attack are about $2^{10} M$. For keys of length at least 384 bits, this attack is faster 
than exhaustive key search. Note that (somewhat counterintuitively) the attack 
becomes more efficient as the length of the key is increased. The attack is most 
efficient for 8192-bit keys, where the data complexity is about $2^{10}$ bits of key 
stream encrypted under the same key and $2^{15}$ pairs of related IVs, and the time 
complexity is less than $2^{32}$ cycles. For keys of length at most 256 bits, the data 
and time complexities of the attack are at least $2^{618}$ and hence the related-key 
attack is much slower than exhaustive key search. 
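
The cost estimate above is easy to tabulate. The following sketch evaluates $\log_2(2^{10} M)$ with $M$ as in (4.3); the function name and its parameters are ours, $t = 8192/\text{keylength}$ is treated as a real number (an assumption for key lengths that do not divide 8192), and the model is only as good as the random walk assumption it rests on.

```python
import math

def log2_related_iv_cost(key_bits, loops=8, words_per_loop=32, iv_steps=4):
    """Return log2 of 2^10 * M, where M is the expected number of related-IV pairs.

    Hypothetical helper: follows the product formula (4.3), with `loops`
    main-loop runs per initialization step and 32 T lookups per loop.
    """
    t = 8192 / key_bits                      # T words touched by one key/IV word
    if iv_steps * t >= 256:
        raise ValueError("the estimate needs iv_steps * t < 256")
    lookups = loops * words_per_loop         # T lookups in one initialization step
    log2_m = 0.0
    for i in range(1, iv_steps + 1):         # IV step i: i*t words of T differ
        log2_m -= lookups * math.log2(1 - i * t / 256)
    # First loop of key stream generation: 32 further lookups must also agree.
    log2_m -= words_per_loop * math.log2(1 - iv_steps * t / 256)
    return 10 + log2_m

for bits in (256, 384, 8192):
    print(bits, round(log2_related_iv_cost(bits), 1))
```

For 8192-bit keys the helper gives $M$ close to $2^{15}$, and for 384-bit keys the total cost comes out near $2^{384}$, which is where the attack starts to compete with exhaustive search. Passing a smaller `loops` value shows how a faster key schedule lowers the cost.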

If we try to speed up the key schedule by reducing the number of loops 
performed at each step of the key schedule, the complexity of the related-key 
attack is reduced considerably. For example, if the number of loops is reduced 
to four (instead of eight), the complexity of the related-key attack becomes 

$M' = (1 - t/256)^{-128} \cdot (1 - 2t/256)^{-128} \cdot (1 - 3t/256)^{-128} \cdot (1 - 4t/256)^{-128} \cdot (1 - 4t/256)^{-32}$ (4.4) 

In this case, the attack is faster than exhaustive key search for keys of length at 
least 320 bits. If the number of loops is further reduced to two, the complexity 
of the attack becomes 

$M'' = (1 - t/256)^{-64} \cdot (1 - 2t/256)^{-64} \cdot (1 - 3t/256)^{-64} \cdot (1 - 4t/256)^{-64} \cdot (1 - 4t/256)^{-32}$ (4.5) 

and then the attack is faster than exhaustive search for keys of length at least 
224 bits. 

If the key schedule is sped up by inserting the output of the eight loops into 
the T array, instead of XORing it bit-wise into the content of the T array (as was 
proposed in a previous variant of the cipher), the complexity of the related-key 
attack drops to 

$M''' = ((1 - t/256)^{-256})^4$ (4.6) 

In this case, the attack is faster than exhaustive key search even for 256-bit keys. 

Hence, the related-key attack described above is a serious obstacle to speeding 
up the key schedule. However, we note that the related-key model in general, and 
in particular its requirement of obtaining a huge number of encryptions under 
different related-IV pairs, is quite unrealistic. 

4.7 Other Kinds of Attacks 

We subjected the cipher to other kinds of attacks, including algebraic attacks 
and attacks exploiting classes of weak keys. We did not find any discrepancies 
in these cases. 

5 Summary 

We have proposed a new fast and secure stream cipher, mv3. The main attributes 
of the cipher are efficiency in software, high security, and its basis upon clearly 
analyzable components. 
The cipher makes use of new rapidly mixing random walks, to ensure the 
randomness in the long run. The randomness in the short run is achieved by 
revolving buffers that are easily implemented in software, and break short 
correlations between the words of the internal state. 

The cipher is word-based, and hence is most efficient on 32-bit processors. 
On a Pentium IV, the cipher runs at a speed of 4.8 cycles per byte. 

A Appendix: Mathematical Background 

The good long term randomness properties of the internal state of mv3 are 
achieved by updates using rapidly mixing random walks. Actually, the walks are 
only pseudo-random since the cipher is fully deterministic, but we desire the 
update rule to be as close as possible to a random walk. In this appendix we 
recall some mathematics used to study random walks, such as expander graphs 
and the rapid mixing property. Afterwards, we describe two particular types of 
random walks used in the mv3 cipher: a well-known random walk in the additive 
group $\mathbb{Z}/2^n\mathbb{Z}$, and a novel random walk that mixes addition with multiplication 
operations. 

A.1 Rapidly Mixing Random Walks and Expander Graphs 

Recall that a random walk on a graph starts at a node $z_0$, and at each step 
moves to a node connected by one of its adjacent edges at random. A lazy 
random walk is the same, except that it stays at the same node with probability 
1/2, and otherwise moves to an adjacent node at random. Intuitively, a random 
walk is called "rapidly mixing" if, after a relatively short time, the distribution 
of the state of the walk is close to the uniform distribution — regardless of the 
initial distribution of the walk. 

Next, we come to the notion of expander graph. Let $\Gamma$ be an undirected 
$k$-regular graph on $N < \infty$ vertices. Its adjacency operator acts on $L^2(\Gamma)$ by 
summing the values of a function at the neighbors of a given vertex: 

$(Af)(x) = \sum_{y \sim x} f(y)$. (A.1) 

The spectrum of $A$ is contained in the interval $[-k, k]$. The trivial eigenvalue 
$\lambda = k$ is achieved by the constant eigenvector; if the graph is connected then 
this eigenvalue has multiplicity 1, and all other eigenvalues are strictly less than 
$k$. A sequence of $k$-regular graphs (where the number of vertices tends to infinity) 
is customarily called a sequence of expanders if all nontrivial eigenvalues $\lambda$ of all 
the graphs in the sequence satisfy the bound $|\lambda| \le k - c$ for an absolute constant 
$c$. We shall take a slightly more liberal tack here and consider graphs which 
satisfy the weaker eigenvalue bound $|\lambda| \le k - c(\log N)^{-A}$ for some constant 
$A \ge 0$. 

The importance of allowing the lenient eigenvalue bound $|\lambda| \le k - c(\log N)^{-A}$ 
is that a random walk on such a graph mixes in polylog($N$) time, even if $A > 0$. 
More precisely, we have the following estimate (see, for example, [JMV05, 
Proposition 3.1]). 

Proposition A.1 Let $\Gamma$ be a regular graph of degree $k$ on $N$ vertices. Suppose 
that the eigenvalue $\lambda$ of any nonconstant eigenvector satisfies the bound $|\lambda| \le \sigma$ 
for some $\sigma < k$. Let $S$ be any subset of the vertices of $\Gamma$, and $x$ be any vertex in 
$\Gamma$. Then a random walk of any length at least $\frac{\log(2N/|S|^{1/2})}{\log(k/\sigma)}$ starting from $x$ will 
land in $S$ with probability at least $\frac{1}{2}\frac{|S|}{N} = \frac{|S|}{2|\Gamma|}$. 

Indeed, with $\sigma = k - c(\log N)^{-A}$, the random walk becomes evenly distributed 
in the above sense after $O((\log N)^{A+1})$ steps. 

Next, we come to the issue of estimating the probability that the random walk 
returns to a previously visited node. This is very important for cryptographic 
purposes, since short cycles lead to relations which an attacker can exploit. The 
following result gives a very precise estimate of how unlikely it is that a random 
walk returns to the vertex it starts from. More generally, it shows that if one has 
any set $S$ consisting of, say, one quarter of all nodes, then the number of visits 
of the random walk to this set will be exceptionally close to that of a purely 
random walk in the sense that it will obey a Chernoff type bound. This in turn 
allows one to show that the idealized cipher passes all the moment tests. 

Theorem A.2 ([Gi98, Theorem 2.1]) Consider a random walk on a $k$-regular 
graph $\Gamma$ on $N$ vertices for which the second-largest eigenvalue of the adjacency 
operator $A$ equals $k - \epsilon k$, $\epsilon > 0$. Let $S$ be a subset of the vertices of $\Gamma$, and $t_n$ 
the random variable of how many times a particular walk of $n$ steps along the 
graph lands in $S$. Then, as sampled over all random walks, one has the following 
estimate for any $x > 0$: 

$\mathrm{Prob}\left\{ \left| t_n - n\frac{|S|}{|\Gamma|} \right| \ge x \right\} \le \left(1 + \frac{x\epsilon}{10n}\right) e^{-x^2\epsilon/(20n)}$. (A.2) 



Thus even with a moderately small value of $\epsilon$, the random walk avoids 
dwelling in any one place overly long. The strength of the Chernoff type bound 
(A.2) is also useful for ruling out other substitutes for random walks because 
of their non-random behavior. For example, it has been shown by Klimov and 
Shamir [KS04] that iterates of their T-functions on $n$-bit numbers cycle through 
all $n$-bit numbers exactly once, whereas our random walks will have very large 
expected return times. 

In practice, algorithms often actually consider random walks on directed 
graphs. The connection between rapid mixing of directed graphs (with corre- 
sponding adjacency/transition matrix M) and undirected graphs is as follows. 
A result of J. Fill shows that if the additive reversalization (whose adjacency 
matrix is $M + M^t$) or multiplicative reversalization (whose adjacency matrix is 
$MM^t$) rapidly mixes, then the lazy random walk on the directed version also 
rapidly mixes. From this it is easy to derive the effect of having no self-loops 
as well. Moreover, if the undirected graph has expansion, then so does the 
directed graph, provided it has an Eulerian orientation. It is important to note 
that this implication can also be used to greatly improve poorly mixing graphs. 
For example, we will present a graph in Theorem A.4 which involves additive 
reversalization in an extreme case: where the original graph is definitely not 
an expander (the random walk mixes only in time proportional to the number 
of vertices $N$), yet the random walk on the additive reversalization mixes in 
polylog($N$) time. 

Expander graphs are natural sources of (pseudo)randomness, and have 
numerous applications as extractors, de-randomizers, etc. (see [HLW06]). However, 
there are a few practical problems that have to be resolved before expanders can 
be used in cryptographic applications. One of these, as mentioned above, is a 
serious security weakness: the walks in such a graph have a constant probability 
of returning to an earlier node in a constant number of steps. It is possible to solve 
this problem by adding the current state (as a binary string) to that of another 
process which has good short term properties, but this increases the cache size. 
In addition, if the graph has large directed girth (i.e. no short cycles), then the 
short term return probabilities can be minimized or even eliminated. 

A.2 Additive Random Walks on $\mathbb{Z}/2^n\mathbb{Z}$ 

Most of the random walks used in the cipher, namely the random walks used 
in the updates of $j$, $x$, and $T$, are performed in the additive group $\mathbb{Z}/2^n\mathbb{Z}$. The 
mixing properties of these walks can be studied using results on Cayley graphs 
of this group. In general, given a group $G$ with a set of generators $S$, the Cayley 
graph $X(G,S)$ of $G$ with respect to $S$ is the graph whose vertices consist of 
elements of $G$, and whose edges connect pairs $(g, gs)$, for all $g \in G$ and $s \in S$. 

Alon and Roichman [AR94] gave a detailed study of the expansion 
properties of abelian Cayley graphs, viewed as undirected graphs. They showed that 
$X(G, S)$ is an expander when $S$ is a randomly chosen subset of $G$ whose size is 
proportional to $\log |G|$. More precisely, they have shown the following: 

Theorem A.3 ([AR94]). For every $0 < \delta < 1$ there is a positive constant 
$c = c(\delta)$ such that the following assertion holds. Let $G$ be a finite abelian group, 
and let $S$ be a random set of $c \log |G|$ elements of $G$. Then the expected value of 
the second largest eigenvalue of the normalized adjacency matrix of $X(G, S)$ is 
at most $1 - \delta$. 

The normalized adjacency matrix is simply the adjacency matrix, divided 
by the degree of the graph. Thus, in light of the results of the last section, the 
proposition implies that these random abelian Cayley graphs are expanders, and 
hence random walks on them mix rapidly. 

Using second-moment methods it can be shown that the graph is ergodic (and 
also that the length of the shortest cycle is within a constant factor of $\log |\Gamma|$) 
with overwhelming probability over the choice of generators. The significance of 
this is that we need not perform a lazy random walk, which would introduce 
undesirable short term correlations as well as waste cycles and compromise the 
cryptographic strength. 

In mv3, the rapid mixing of the random walks updating $x$, $j$ and $T$ follows 
from the theorem of Alon and Roichman. For example, consider the update rule 
of $x$: 

$x \leftarrow x + T[j]$. 

The update rule corresponds to a random walk on the Cayley graph $X(G,S)$ 
where $G$ is the additive group $\mathbb{Z}/2^n\mathbb{Z}$ and $S$ consists of the 256 elements of the 
T register. Note that we have $|S| = 4\log_2(|G|)$. In order to apply the theorem of 
Alon and Roichman we need the elements of the T array to be random and 
the walk itself to be random, that is, $j$ must be chosen randomly each time 
in $\{0, \ldots, 255\}$. Hence, assuming that $j$ and $T$ are uniformly distributed, we have 
a rapid mixing property for $x$. Similarly, one can get a rapid mixing property for 
$j$ using the randomness of $x$. 
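
To make the mixing claim concrete, the sketch below tracks the exact distribution of an idealized additive walk $x \leftarrow x + s$ on a small group $\mathbb{Z}/2^n\mathbb{Z}$, with $s$ drawn uniformly from a random generator set of size $4\log_2|G|$, and records its total variation distance from uniform after each step. It is a toy model under stated assumptions (tiny $n$, truly random choice of generator, hypothetical function names), not the cipher's update rule.

```python
import random

def tv_to_uniform(dist):
    """Total variation distance between `dist` and the uniform distribution."""
    n = len(dist)
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)

def walk_tv_profile(n_bits=8, steps=40, seed=1):
    """TV distance to uniform after 0, 1, ..., `steps` steps of x <- x + s."""
    size = 1 << n_bits
    rng = random.Random(seed)
    gens = rng.sample(range(size), 4 * n_bits)   # |S| = 4 log2 |G| distinct elements
    dist = [0.0] * size
    dist[0] = 1.0                                # the walk starts at x = 0
    profile = [tv_to_uniform(dist)]
    for _ in range(steps):
        nxt = [0.0] * size
        for x, p in enumerate(dist):
            if p:
                share = p / len(gens)            # each generator is equally likely
                for s in gens:
                    nxt[(x + s) % size] += share
        dist = nxt
        profile.append(tv_to_uniform(dist))
    return profile

profile = walk_tv_profile()
print([round(d, 4) for d in profile[:8]])
```

The distance can never increase from one step to the next, because each step averages the distribution over translates; how quickly it falls depends on the particular generator set, which is the question the Alon-Roichman theorem addresses.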



A.3 Non-linear Random Walks 

In order to introduce some nonlinearity to the cipher, we use a multiplier c 
that affects the cipher output in a multiplicative way. The multiplier itself is 
updated using a nonlinear random walk that mixes addition and multiplication 
operations. The idealized model of this random walk is described in the following 
theorem: 

Theorem A.4 Let $N$ and $r$ be relatively prime positive integers greater than 
1, and $\bar{r}$ an integer such that $r\bar{r} \equiv 1 \pmod{N}$. Let $\Gamma$ be the 4-valent graph on 
$\mathbb{Z}/N\mathbb{Z}$ in which each vertex $x$ is connected to the vertices $r(x + 1)$, $r(x - 1)$, 
$\bar{r}x + 1$, and $\bar{r}x - 1$. Then there exists a positive constant $c > 0$, depending only 
on $r$, such that all nontrivial eigenvalues $\lambda$ of the adjacency matrix of $\Gamma$ satisfy 
the bound 

$|\lambda| \le 4 - \frac{c}{(\log N)^2}$, (A.3) 

or are of the form 

$\lambda = 4\cos(2\pi k/N)$ for $k$ satisfying $rk \equiv k \pmod{N}$. 

In particular, if $N$ is a power of 2 and $(r - 1, N) = 2$, then $\Gamma$ is a bipartite graph 
for which all eigenvalues not equal to $\pm 4$ satisfy (A.3). 

The proof of the theorem can be found in Appendix B. The result means 
that for a fixed $r$, $\Gamma$ is an expander graph in the looser sense that its eigenvalue 
separation is at least $c/(\log N)^2$ for $N$ large. This is still enough to guarantee 
that the random walk on the graph mixes rapidly (i.e. in polylog($N$) time). 

We note that although we use an additive notation, the theorem holds for 
any cyclic group, for example a multiplicative group in which the multiplication 
by $r$ corresponds to exponentiation (this is the non-linearity we are referring to). 
Also the expressions $r(x \pm 1)$, $\bar{r}x \pm 1$ may be replaced by $r(x \pm g)$, $\bar{r}x \pm g$ for 
any integer $g$ relatively prime to $N$. Additionally, the expansion remains valid if 
a finite number of extra relations of this form are added. 

We also note that it was observed by Klawe [Kla81] that graphs of the form 
described in the theorem cannot be expanders with a constant eigenvalue 
separation, i.e. the assertion of the theorem is false without the logarithmic terms in 
the denominator. Even so, this would not change the polynomial dependence of 
$\log N$ in the mixing time, but only improve its exponent. 

The operation used in the mv3 cipher algorithm itself is slightly different: it 
involves not only addition steps, but also a squaring or cubing step. Though this 
is not covered directly by the theorem, it is similar in spirit. We have run extensive 
numerical tests and found that this operation can in fact greatly enhance the 
eigenvalue separation, apparently giving eigenvalue bounds of the form $|\lambda| \le \sigma$ 
for some constant $\sigma < 4$ (Klawe's theorem does not apply to this graph). Thus 
the squaring or cubing operations are not covered by the theoretical bound (A.3), 
but empirically give stronger results anyhow. 
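
The graph of Theorem A.4 is simple to construct explicitly, which helps when experimenting with its spectrum. The sketch below builds the neighbor lists for small $N$ and checks the structural facts the theorem relies on; the function names are ours, and `pow(r, -1, n)` (Python 3.8 or later) is used to obtain $\bar{r}$.

```python
from math import gcd

def a4_neighbors(x, n, r):
    """Neighbors of x: r(x+1), r(x-1), r_bar*x + 1, r_bar*x - 1, all mod n."""
    r_bar = pow(r, -1, n)                  # inverse of r modulo n
    return [(r * (x + 1)) % n, (r * (x - 1)) % n,
            (r_bar * x + 1) % n, (r_bar * x - 1) % n]

def adjacency_counts(n, r):
    """Adjacency matrix with multiplicities, as a list of rows."""
    if n < 2 or r < 2 or gcd(n, r) != 1:
        raise ValueError("n and r must be coprime integers greater than 1")
    counts = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in a4_neighbors(x, n, r):
            counts[x][y] += 1
    return counts

A = adjacency_counts(64, 3)                # N a power of 2, gcd(r - 1, N) = 2
print(all(sum(row) == 4 for row in A))     # the graph is 4-valent
```

The matrix is symmetric because the edge $x \to r(x+1)$ is undone by $y \to \bar{r}y - 1$ and $x \to r(x-1)$ by $y \to \bar{r}y + 1$. For even $N$ every edge joins vertices of opposite parity, which is the bipartite structure mentioned in the theorem.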



B Appendix: Proof of Theorem A.4 

We begin with some considerations in harmonic analysis. We may write the 
adjacency operator on $L^2(\Gamma) = L^2(\mathbb{Z}/N\mathbb{Z})$ as 

$A = MP + P^t M = (MP) + (MP)^*$, (B.1) 

where 

$(Mf)(a) = f(a + 1) + f(a - 1)$ (B.2) 

and 

$(Pf)(a) = f(ra)$, $(P^t f)(a) = f(\bar{r}a)$. (B.3) 

The additive characters of $\mathbb{Z}/N\mathbb{Z}$ play an important role. They are indexed by 
integers $k \in \mathbb{Z}/N\mathbb{Z}$ as follows: 

$\chi = \chi_k : a \mapsto e^{2\pi i k a/N}$. (B.4) 

These characters are eigenfunctions of $M$ with eigenvalue $\lambda_\chi = \chi(1) + \chi(-1)$, so 
that $\lambda_k = \lambda_{\chi_k} = 2\cos(2\pi k/N)$. Furthermore $P\chi = \chi^r$, which means $P\chi_k = \chi_{rk}$ and 
$P^t\chi_k = \chi_{\bar{r}k}$. 

The operator $A$ is self-adjoint, so its spectrum may be analyzed by means of 
the Rayleigh quotient. To prove the theorem, it suffices to show the existence of 
a constant $c > 0$ such that 

$\max_{v \perp 1} \frac{\langle Av, v\rangle}{\langle v, v\rangle} \le 4 - \frac{c}{(\log N)^2}$. (B.5) 

Here 1 denotes the constant function on the graph, which is the trivial character 
$\chi_0$, and $\langle v, w\rangle = \sum_{j=1}^{N} v_j \overline{w_j}$ denotes the $L^2$-inner product of functions on $\Gamma$. 
Every vector $v \in L^2(\Gamma)$ has an expansion of the form $v = \sum_\chi c_\chi \chi$ in terms 
of the basis of characters $\chi_k$; the condition that $v \perp 1$ is simply equivalent to 
requiring that $c_{\chi_0} = 0$. 

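Let us now calculate the inner products in (B.5) for $v = \sum_{\chi \ne 1} c_\chi \chi$, using 
the fact that $\langle\chi, \chi'\rangle = N$ if $\chi = \chi'$, and 0 otherwise. First, $\langle v, v\rangle = N \sum_{\chi \ne 1} |c_\chi|^2$. 
As 

$A\chi_k = MP\chi_k + P^t M\chi_k = M\chi_{rk} + P^t\lambda_k\chi_k = \lambda_{rk}\chi_{rk} + \lambda_k\chi_{\bar{r}k}$, (B.6) 

$\chi_k$ is an eigenfunction of $A$ with eigenvalue $2\lambda_k$ if $k \equiv rk \pmod{N}$. This accounts 
for the explicit eigenvalues which are mentioned in the statement of the theorem. 
We have that 

$Av = \sum_{k=1}^{N-1} c_k \lambda_{rk} \chi_{rk} + \sum_{k=1}^{N-1} c_k \lambda_k \chi_{\bar{r}k}$, (B.7) 

where we have set $c_k = c_{\chi_k}$ for notational convenience. The inner product $\langle Av, v\rangle$ 
satisfies 

$\langle Av, v\rangle = \sum_{k,\ell=1}^{N-1} c_k \overline{c_\ell} \left[ \lambda_{rk}\langle\chi_{rk}, \chi_\ell\rangle + \lambda_k\langle\chi_{\bar{r}k}, \chi_\ell\rangle \right]$ 

$= N \sum_{k=1}^{N-1} c_k \overline{c_{rk}} \lambda_{rk} + N \sum_{\ell=1}^{N-1} \overline{c_\ell}\, c_{r\ell} \lambda_{r\ell}$ (B.8) 

$\le N \sum_{k=1}^{N-1} |c_k| |c_{rk}| |\lambda_{rk}| + N \sum_{\ell=1}^{N-1} |c_\ell| |c_{r\ell}| |\lambda_{r\ell}|$. 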

We are now reduced to a problem about quadratic forms. For $1 \le k, \ell \le N - 1$, 
let 

$a_{k,\ell} = |\lambda_k| + |\lambda_\ell|$ if $k \equiv r\ell \pmod{N}$ and $\ell \equiv rk \pmod{N}$, 
$a_{k,\ell} = |\lambda_k|$ if $k \equiv r\ell \pmod{N}$ and $\ell \not\equiv rk \pmod{N}$, 
$a_{k,\ell} = |\lambda_\ell|$ if $k \not\equiv r\ell \pmod{N}$ and $\ell \equiv rk \pmod{N}$, 
$a_{k,\ell} = 0$ otherwise. (B.9) 

We need to show the existence of a constant $c > 0$ for which 

$\sum_{k,\ell=1}^{N-1} a_{k,\ell}\, y_k y_\ell \le \left(4 - \frac{c}{(\log N)^2}\right) \sum_{k=1}^{N-1} y_k^2$ (B.10) 

for any $N - 1$ real numbers $y_1, \ldots, y_{N-1}$. Since the spectrum coming from the 
characters $\chi_k$ for which $rk \equiv k \pmod{N}$ has already been accounted for, we 
may assume $y_k = 0$ for such $k$, and modify (B.9) so that 

$a_{k,\ell} = a_{\ell,k} = 0$ if $rk \equiv k \pmod{N}$. (B.11) 

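For this we use the following inequality. 

Lemma B.1 (Proposition 8 in [JM85]) Let $(a_{ij})$ be a symmetric $n \times n$ real 
matrix whose entries are nonnegative. Let $(\gamma_{ij})$ be an $n \times n$ real matrix with 
positive entries for which $\gamma_{ij}\gamma_{ji} = 1$. Then 

$\sum_{i,j \le n} a_{ij} y_i y_j \le \max_{1 \le i \le n} \left( \sum_{j \le n} a_{ij}\gamma_{ij} \right) \sum_{i \le n} y_i^2$. (B.12) 

Since the proof is short, we have included it here. 

Proof: Since $0 \le (\gamma^{1/2} y_i \pm \gamma^{-1/2} y_j)^2 = \gamma y_i^2 + \gamma^{-1} y_j^2 \pm 2 y_i y_j$, we may bound 

$\sum_{i,j \le n} a_{ij} y_i y_j \le \frac{1}{2} \sum_{i,j \le n} 2 a_{ij} |y_i| |y_j| \le \frac{1}{2} \sum_{i,j \le n} a_{ij} (\gamma_{ij} y_i^2 + \gamma_{ji} y_j^2) = \sum_{i,j \le n} a_{ij}\gamma_{ij} y_i^2$, (B.13) 

where the last equality uses the symmetry of $(a_{ij})$. □ 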



Now we specify which $\gamma_{k,\ell}$ to use in bounding our sequence. (In what follows 
we closely follow the technique of Jimbo-Maruoka from a different example in 
[JM85].) Given an element $i \in \mathbb{Z}/N\mathbb{Z}$, we let $\|i\|$ denote the distance from 
$i$ to $N\mathbb{Z}$. In other words, if $i$ is represented by a residue between 0 and $N$, 
$\|i\| = \min\{i, N - i\}$. For $s \ge 1$ set 

$\alpha_s = 1 - \frac{s\,d}{(\log N)^2}$, (B.14) 

where $d$ is a small constant (depending on $r$) which shall be chosen later. Given 
an integer $m$ relatively prime to $N$, we define $s_m$ to be the largest integer $s$ such 
that $r^s$ divides $\|2m\|$. Since $\|2m\| \le N/2$, $s = O(\log N)$ and $\alpha_s > 0$ provided $d$ 
is sufficiently small. We set $\gamma_{k,\ell} = 1$ except in the following cases: 

                              $\|2k\| < N/(2r)$                          $\|2k\| \ge N/(2r)$ 
$\|2\bar{r}k\| < N/(2r)$      $\gamma_{k,rk} = \alpha_{s_k}$, $\gamma_{k,\bar{r}k} = \alpha^{-1}_{s_{\bar{r}k}}$      $\gamma_{k,\bar{r}k} = \alpha^{-1}_{s_{\bar{r}k}}$ 
$\|2\bar{r}k\| \ge N/(2r)$    $\gamma_{k,rk} = \alpha_{s_k}$                          (no exceptions) 

This satisfies the requirement that $\gamma_{k,\ell}\gamma_{\ell,k} = 1$. We will choose the constant $d$ 
to be smaller yet so that each $\gamma_{k,\ell} \le 1 + \frac{1}{2}(1 - \cos\frac{\pi}{2r})$, as we may do. To 
finish the proof we must now show the existence of a constant $c > 0$ so that 



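$\sum_{\ell=1}^{N-1} a_{k,\ell}\gamma_{k,\ell} = \gamma_{k,rk}|\lambda_{rk}| + \gamma_{k,\bar{r}k}|\lambda_k| \le 4 - \frac{c}{(\log N)^2}$ (B.15) 

for each $1 \le k \le N - 1$ which does not satisfy $rk \equiv k \pmod{N}$. 

Case I: Assume that $\|2k\| \ge N/(2r)$. Then 

$k \in \left[\frac{N}{4r}, \frac{N}{2} - \frac{N}{4r}\right] \cup \left[\frac{N}{2} + \frac{N}{4r}, N - \frac{N}{4r}\right]$, 
$\frac{2\pi k}{N} \in \left[\frac{\pi}{2r}, \pi - \frac{\pi}{2r}\right] \cup \left[\pi + \frac{\pi}{2r}, 2\pi - \frac{\pi}{2r}\right]$, (B.16) 

and $|\lambda_k| = 2|\cos(\frac{2\pi k}{N})| \le 2\cos(\frac{\pi}{2r})$. Now the left-hand side of (B.15) is bounded 
by $(1 + \frac{1}{2}(1 - \cos\frac{\pi}{2r}))(2 + 2\cos\frac{\pi}{2r}) = 4 - 4\sin(\frac{\pi}{4r})^4$, which is bounded away from 
4 by a positive constant depending only on $r$. 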

Case II: Now assume that $\|2k\| < N/(2r)$, and that $rk$ is not congruent to 
$k$ modulo $N$. Using the trivial bound that $|\lambda_k|, |\lambda_{rk}| \le 2$, the left-hand side of 
(B.15) is bounded by $2(\gamma_{k,rk} + \gamma_{k,\bar{r}k})$. Both cases in the first column of the table 
have $\gamma_{k,rk} = \alpha_{s_k}$. If $\|2\bar{r}k\| \ge N/(2r)$, then $\gamma_{k,\bar{r}k} = 1$ and $1 + \alpha_{s_k} \le 2 - d/(\log N)^2$, 
so that the bound in (B.15) is satisfied so long as $c < 2d$, which it may be chosen 
to be. 

The only remaining situation is when both $\|2k\| < N/(2r)$ and $\|2\bar{r}k\| < 
N/(2r)$, where the left-hand side of (B.15) is bounded by $2(\alpha_{s_k} + \alpha^{-1}_{s_{\bar{r}k}})$. Let 
$-N/(2r) < m < N/(2r)$ be the integer congruent to $2\bar{r}k$ modulo $N$, i.e. so 
that $\|2\bar{r}k\| = |m|$. Then $rm \equiv 2k \pmod{N}$. Yet since $-N/2 < rm < N/2$, 
$\|2k\| = |rm|$. We may assume that $m \ne 0$, for otherwise $2k \equiv 2rk \equiv 0 \pmod{N}$; 
this implies $k \equiv rk \pmod{N}$ if $N$ is odd, and $k \equiv rk \equiv N/2$ if $N$ is even (since 
then $r$ is odd). Therefore $r$ divides $\|2k\| = |rm|$ to exactly one more power than 
it divides $\|2\bar{r}k\| = |m|$. Thus $s_{\bar{r}k} = s_k - 1$. Now 

$\frac{1}{1 - \frac{s\,d}{(\log N)^2}} = 1 + \frac{s\,d}{(\log N)^2} + O\left(\frac{s^2 d^2}{(\log N)^4}\right) = 1 + \frac{s\,d}{(\log N)^2} + O\left(\frac{d^2}{(\log N)^2}\right)$ 

since $s = O(\log N)$. Therefore $(\alpha_{s_k} + \alpha^{-1}_{s_{\bar{r}k}})$ equals 

$1 - \frac{s_k d}{(\log N)^2} + \frac{1}{1 - \frac{s_{\bar{r}k} d}{(\log N)^2}} \le 2 - (s_k - s_{\bar{r}k})\frac{d}{(\log N)^2} + O\left(\frac{d^2}{(\log N)^2}\right)$, 

which is smaller than $2 - c/(\log N)^2$ for some sufficiently small $c > 0$. This 
concludes the proof of Theorem A.4. 

C Appendix: Related Work 

Theoretically, the requirements for stream ciphers are well understood: 
cryptographically secure pseudo-random number generators (PRNG) exist if and only 
if one-way functions exist [H+99], and such a generator would be ideal as a 
stream cipher. However, such known constructions would yield prohibitively slow 
implementations in practice. The heart of such constructions involves a one-way 
function $f$ and a hard-core bit extractor $B(x)$. If $f$ is based on an algebraic 
problem such as DLOG or factoring, the resulting cipher is quite slow even 
when $B(x)$ is simple and consists of outputting some bits of $x$. If $f$ is based on 
block ciphers, then $B(x, r) = \mathrm{parity}(x \wedge r)$ is often based on the Goldreich-Levin 
theorem [GL89]. Computing this parity bit takes on the average ^ cycles, where 
$n$ is the machine word size. One can speed this up with some precomputations 
and make it into a practical algorithm with provable properties (e.g. the VRA 
cipher [ARV95], which has the disadvantage of needing to store a large array of 
random bits). 

Computerized methods for random number generation go back to von 
Neumann [Neu51]. Many designers of PRNGs used clever techniques to control 
correlations between adjacent outputs of their algorithms, but few generators needed 
it as badly as LFSR-based algorithms [Golo67]. Indeed, since the Berlekamp-Massey 
algorithm [Mas69] efficiently determines the state of an LFSR of length 
$n$ given only $2n$ bits, all LFSR-based constructions necessarily must hide the 
LFSR's exact output sequence. 

Historically, the first method to protect LFSRs against the Berlekamp-Massey 
attack was due to Geffe [Gef73]. It combines outputs of three synchronously 
clocked LFSRs to produce one stream of output bits, using one of them as a 
multiplexer. This is a lossy combiner in the sense that it outputs only one of three 
bits generated by the LFSRs. It was broken by Siegenthaler [Sie84], who also 
broke another 3-way non-linear combiner of Bruer [Bru84]. Another attempt by 
Pless [Ple77], which used non-linear J-K flip-flops to combine eight LFSRs 
into one key stream, was broken shortly thereafter [Rub79]. 

A recommended approach to designing LFSR-based ciphers is the shrinking 
generator [CKM93]. It outputs only one quarter of its generated bits, but has 
proved to be secure after 10 years of wide use and extensive scrutiny. 

Most combiners considered in the LFSR literature are constructed from two 
building blocks: a (non)linear function that mixes inputs of several generators 
(this function may either be memoryless or stateful, though usually of very small 
memory), and a clocking rule that controls the clock of some LFSR's. None of 
them uses deep buffers or tries to space LFSR outputs using schemes with 
guaranteed properties. For attacks on combiners with small memory (up to 4 
bits) see [Cou04]. 

A different approach for combining generators' outputs is called 
randomization by shuffling [Knu97, Ch. 3.2.2]. Two algorithms popularized by Knuth are 
often used in modern generators: the "algorithm M" or MacLaren-Marsaglia 
algorithm [MM65], and the "algorithm B" or Bays-Durham algorithm [BD76]. 
Both are analogous to our proposal in the sense that they store the generator's 
output in a buffer and output the stored elements out of order. The fundamental 
difference (and source of weakness) of both algorithms M and B is that 
they only reorder elements without modifying them. We omit the details. For 
example, the Bays-Durham algorithm operates as follows: 

Bays-Durham Algorithm. 

$Y$ is an auxiliary variable, $T$ is the size of the buffer $V$, and $m$ is the range of the 
generator $(X_n)$. Initially $V$ is filled with the $T$ elements $X_0, \ldots, X_{T-1}$. Iterate the 
following: 

1. set $j \leftarrow \lfloor TY/m \rfloor$. 

2. set $Y \leftarrow V[j]$. 

3. output $Y$. 

4. set $V[j] \leftarrow$ next element of $(X_n)$. 
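
The four steps above can be written as a short Python generator. This is a minimal sketch. The text does not say how Y is initialized, so seeding it with the element that follows the initial buffer fill is an assumption made here, and the name `bays_durham` is ours.

```python
import random
from itertools import islice

def bays_durham(source, T, m):
    """Shuffle `source` (integers in [0, m)) through a buffer V of T entries."""
    source = iter(source)
    V = [next(source) for _ in range(T)]   # V is filled with X_0, ..., X_{T-1}
    Y = next(source)                       # assumption: seed Y with the next element
    while True:
        j = (T * Y) // m                   # step 1: j = floor(T*Y/m)
        Y = V[j]                           # step 2
        yield Y                            # step 3
        try:
            V[j] = next(source)            # step 4: refill slot j
        except StopIteration:
            return                         # underlying generator exhausted

# Usage with a stand-in generator (not a cryptographic one).
rng = random.Random(1)
stream = (rng.randrange(256) for _ in range(1000))
print(list(islice(bays_durham(stream, T=16, m=256), 20)))
```

Every output value is an unmodified element of the input stream, so the sketch only permutes the sequence.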

Since the position of the output element is completely determined by the 
previous element, the construction does not improve cryptographic properties of 
the cipher. If $Y$ is chosen by an independent process (as in algorithm M), 
there is still a $1/T$ chance that two elements $X_i$ and $X_{i+1}$ will end up next to 
each other in the output sequence. More generally, the distance between $X_i$ and 
$X_{i+1}$ is distributed according to a geometric distribution and has average $T$. 
Depending on the generator, this property may be exploitable. 
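
The adjacency claim can be checked empirically with a small Monte Carlo experiment. The sketch below assumes the output slot is drawn uniformly and independently at each step, which idealizes algorithm M. The function name, the seed, and the sample size are illustrative choices. It tracks where each $X_i$ lands in the output and measures the distance between consecutive inputs.

```python
import random

def shuffle_positions(n, T, rng):
    """Pass X_0..X_{n-1} through a T-slot buffer whose output slot is chosen
    uniformly at random; return pos[i] = output time of X_i."""
    buf = list(range(T))          # buffer holds the indices of X_0..X_{T-1}
    nxt = T                       # index of the next input element
    pos = {}
    t = 0
    while nxt < n:
        j = rng.randrange(T)      # independent choice of the slot
        pos[buf[j]] = t           # the element in slot j is output at time t
        buf[j] = nxt              # the slot is refilled with the next element
        nxt += 1
        t += 1
    return pos

T = 16
pos = shuffle_positions(200_000, T, random.Random(2024))
gaps = [abs(pos[i + 1] - pos[i]) for i in pos if i + 1 in pos]
mean_gap = sum(gaps) / len(gaps)
adjacent = sum(g == 1 for g in gaps) / len(gaps)
print(f"mean distance = {mean_gap:.2f} (T = {T})")
print(f"adjacent fraction = {adjacent:.4f} (1/T = {1 / T:.4f})")
```

Under these assumptions the measured mean distance should be close to $T$ and the adjacent fraction close to $1/T$.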

Klimov and Shamir [KS04] proposed a class of invertible mappings $\{0,1\}^n \to 
\{0,1\}^n$ called T-functions that allow the introduction of non-linearity using 
elementary register operations ($\vee$, $\wedge$, $\oplus$, $*$, $+$, $-$, $x \mapsto \bar{x}$, 
$x \mapsto -x$, $\ll$). T-functions are particularly well suited for fast software 
implementations. An example of such a function is $f(x) = x + (x^2 \vee 5) \pmod{2^n}$, 
for which the sequence $x_{i+1} = f(x_i)$ spans the entire domain in one cycle. 
Each iteration requires only 3 cycles. Nevertheless, by choosing $n = 64$ and 
outputting the top half of $x_i$ (i.e. $H(x_i) = \mathrm{MSB}_{32}(x_i)$), they discovered 
that the resulting pseudo-random sequence passed the statistical test suite for 
AES candidates with significance level $\alpha = 0.01$, which is better than some of 
the AES candidates. Surprisingly, the best known cryptanalytic attacks take 
time $2^{cn}$, where $c$ is a constant. These attacks depend on using the structure 
of the iterated output: this structure is important for proving the properties 
of these functions, and slightly altering the construction would destroy the 
properties. These functions allow some of their parameters to be chosen at 
random subject to certain constraints. 
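
The single-cycle property of the example $f(x) = x + (x^2 \vee 5) \bmod 2^n$ is easy to check by brute force for small word sizes. The following Python sketch does so and also shows the top-half output rule for $n = 64$. The function names are ours, and the word sizes tested are small ones picked to keep the exhaustive walk short.

```python
def klimov_shamir(x, n):
    """One step of f(x) = x + (x^2 OR 5) mod 2^n."""
    return (x + ((x * x) | 5)) & ((1 << n) - 1)

def cycle_length(n, start=0):
    """Number of steps needed to return to `start` under f (f is invertible)."""
    x, steps = klimov_shamir(start, n), 1
    while x != start:
        x = klimov_shamir(x, n)
        steps += 1
    return steps

def top_half(x, n):
    """H(x): the n/2 most significant bits of an n-bit word."""
    return x >> (n // 2)

# Brute-force check that the orbit of 0 covers all 2^n states.
for n in (4, 8, 12, 16):
    print(n, cycle_length(n) == 2 ** n)

# Output rule with n = 64: emit the top 32 bits of each state.
x, outputs = 0, []
for _ in range(4):
    x = klimov_shamir(x, 64)
    outputs.append(top_half(x, 64))
print([hex(y) for y in outputs])
```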

The methods in this paper allow us to resist such attacks better, with minimal 
overhead, and extend the length of the underlying key for the stream cipher. We 
do not know how to extend the known attacks in this new model. 